Estimation of GMM in voice conver
Abstract
Voice conversion consists of transforming a source speaker's voice into a target speaker's voice. In many applications of voice conversion systems, the amounts of training data available from the source speaker and the target speaker differ. Usually, a large amount of source data is available, but the transformation must be estimated from a small amount of target data. Systems based on joint Gaussian Mixture Models (GMMs) are well suited to voice conversion [1], but they cannot use source data that lacks corresponding aligned target data. In this paper, two alternatives are studied for incorporating unaligned source data into the estimation of a GMM for a voice conversion task. It is shown that, when only a limited amount of aligned parameters is available in the training step, including additional data from the source speaker alone improves the performance of the voice transformation.
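For orientation, the joint-GMM approach mentioned in the abstract is commonly written as follows. This is a sketch of the standard formulation from the voice conversion literature, not necessarily the exact notation of this paper; the symbols x (source feature vector), y (aligned target feature vector), M (number of mixture components), and the weights, means, and covariance blocks are assumptions introduced here for illustration.

```latex
% Joint density over stacked, time-aligned source/target vectors z = [x^T, y^T]^T
p(z) = \sum_{i=1}^{M} \alpha_i \, \mathcal{N}(z;\, \mu_i, \Sigma_i),
\qquad
\mu_i = \begin{bmatrix} \mu_i^{x} \\ \mu_i^{y} \end{bmatrix},
\qquad
\Sigma_i = \begin{bmatrix} \Sigma_i^{xx} & \Sigma_i^{xy} \\ \Sigma_i^{yx} & \Sigma_i^{yy} \end{bmatrix}

% Conversion function: posterior-weighted sum of per-component linear regressions
F(x) = \sum_{i=1}^{M} p(i \mid x)
       \left[ \mu_i^{y} + \Sigma_i^{yx} \left( \Sigma_i^{xx} \right)^{-1} \left( x - \mu_i^{x} \right) \right]

% Component posterior: depends only on the source-side parameters
p(i \mid x) = \frac{\alpha_i \, \mathcal{N}(x;\, \mu_i^{x}, \Sigma_i^{xx})}
                   {\sum_{j=1}^{M} \alpha_j \, \mathcal{N}(x;\, \mu_j^{x}, \Sigma_j^{xx})}
```

In this formulation the posterior p(i | x) involves only the weights and the source-side means and covariances, whereas the cross-covariance blocks require aligned source/target pairs. That is one way to see why additional unaligned source data could in principle help. The abstract does not say which two alternatives the paper actually studies.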
Similar resources
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article examines methods for optimizing the performance of GMM-based voice conversion systems, in which the GMM method is introduced as the basic method for improving voice conversion performance. In current methods, because a single conversion function is used to convert all speech units, and because of the subsequent spectral smoothing arising from statistical averaging, we will observe quality ...
GMM Classifier for Identification of Neurological Disordered Voices Using MFCC Features
Automatic detection of neurologically disordered subjects' voices mostly relies on parameters extracted from time-domain processing. The calculation of these parameters often requires prior pitch period estimation, which in turn depends heavily on the robustness of the pitch detection algorithm. In the present work, a cepstral-domain processing technique which does not require pitch estimation has been ado...
Voice activity detection using global soft decision with mixture of Gaussian model
An improvement on the voice detection algorithm using global soft decision (GSD) is made in this paper. In the GSD method, speech and noise are modelled by a presumed probability density function, e.g. a Gaussian pdf. We propose that the estimation and modelling of the signal be done in the domain of the filterbank output, which is widely used in most speech processing applications. Since the output of...
Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering
Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...
Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
The performance of voice conversion has been considerably improved through statistical modeling of spectral sequences. However, the converted speech still contains traces of artificial sounds. To alleviate this, it is necessary to statistically model a source sequence as well as a spectral sequence. In this paper, we introduce STRAIGHT mixed excitation to a framework of the voice conversion bas...